[RESOURCE] databricks_sql_permissions to manage data object access control lists #545

Merged
merged 19 commits into from
Apr 22, 2021

Conversation

nfx

@nfx nfx commented Mar 2, 2021


subcategory: "Security"

databricks_sql_permissions Resource

-> Note This resource has an evolving API, which may change in the upcoming versions.

This resource manages data object access control lists in Databricks workspaces for things like tables, views, databases, and more. To enable table access control, log in to the workspace as an administrator, go to Admin Console, open the Access Control tab, click Enable in the Table Access Control section, and click Confirm. The security guarantees of table access control are effective only if cluster access control is also turned on. Make sure that no users can create clusters in your workspace and that every databricks_cluster has approximately the following configuration:

resource "databricks_cluster" "cluster_with_table_access_control" {
  // ...

  spark_conf = {
    "spark.databricks.acl.dfAclsEnabled"     = "true"
    "spark.databricks.repl.allowedLanguages" = "sql,python,r"
    "spark.databricks.cluster.profile"       = "serverless"
  }

  custom_tags = {
    "ResourceClass" = "Serverless"
  }
}

Example Usage

The following resource definition enforces access control on a table by executing these SQL queries on a special auto-terminating cluster that it creates for the operation:

  • SHOW GRANT ON TABLE `default`.`foo`
  • REVOKE ALL PRIVILEGES ON TABLE `default`.`foo` FROM ... every group and user that has access to it ...
  • GRANT MODIFY, SELECT ON TABLE `default`.`foo` TO `serge@example.com`
  • GRANT SELECT ON TABLE `default`.`foo` TO `special group`
resource "databricks_sql_permissions" "foo_table" {
    table = "foo"

    privilege_assignments {
        principal = "serge@example.com"
        privileges = ["SELECT", "MODIFY"]
    }

    privilege_assignments {
        principal = "special group"
        privileges = ["SELECT"]
    }
}

Argument Reference

Use the following arguments to specify the data object on which to enforce access control. Specify exactly one of them, except that table and view can each be combined with database; otherwise resource creation will fail.

  • database - Name of the database. Has default value of default.
  • table - Name of the table. Can be combined with database.
  • view - Name of the view. Can be combined with database.
  • catalog - (Boolean) Whether this access control applies to the entire catalog. Defaults to false.
  • any_file - (Boolean) Whether this access control applies to reading any file. Defaults to false.
  • anonymous_function - (Boolean) Whether this access control applies to using anonymous functions. Defaults to false.
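
As a rough sketch of how these selectors combine, the definitions below target a view in a named database and the ability to read any file. The resource labels, the `reporting` database, the `daily_sales` view, and the `analysts` group are illustrative assumptions, not names taken from this pull request.

```hcl
# Hypothetical: SELECT on the view `reporting`.`daily_sales`.
# `view` is combined with `database`; if `database` were omitted,
# it would fall back to `default`.
resource "databricks_sql_permissions" "daily_sales_view" {
  database = "reporting"
  view     = "daily_sales"

  privilege_assignments {
    principal  = "analysts" # assumed group display name
    privileges = ["SELECT"]
  }
}

# Hypothetical: access to reading any file.
# No other object selector is set on this resource.
resource "databricks_sql_permissions" "any_file" {
  any_file = true

  privilege_assignments {
    principal  = "analysts" # assumed group display name
    privileges = ["SELECT"]
  }
}
```

Each resource sets a single object selector (with `database` only accompanying `view`), which matches the one-argument rule described above.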

privilege_assignments blocks

You must specify one or more privilege_assignments configuration blocks to declare privileges for a principal, which corresponds to the display_name of a databricks_group or databricks_user. Terraform ensures that only the principals and privileges defined in the resource are applied to the data object and removes anything else. It does not remove any transitive privileges. DENY statements are intentionally not supported. Every privilege_assignments block has the following required arguments:

  • principal - display_name of a databricks_group or databricks_user.
  • privileges - list of privilege names to grant to the principal.

Available privilege names are:

  • SELECT - gives read access to an object.
  • CREATE - gives the ability to create an object (for example, a table in a database).
  • MODIFY - gives the ability to add, delete, and modify data to or from an object.
  • USAGE - does not give any abilities, but is an additional requirement to perform any action on a database object.
  • READ_METADATA - gives the ability to view an object and its metadata.
  • CREATE_NAMED_FUNCTION - gives the ability to create a named UDF in an existing catalog or database.
  • MODIFY_CLASSPATH - gives the ability to add files to the Spark class path.
  • ALL PRIVILEGES - gives all privileges (is translated into all the above privileges).
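
Because USAGE is an additional requirement rather than an ability in itself, a read grant on a table is typically paired with USAGE on its database. The sketch below illustrates that pairing under assumed names: the `analysts` group, the `reporting` database, and the `events` table are hypothetical.

```hcl
# Hypothetical group; its display_name is what `principal` refers to.
resource "databricks_group" "analysts" {
  display_name = "analysts"
}

# USAGE on the database: grants no abilities by itself, but is required
# before actions on objects inside the database can succeed.
resource "databricks_sql_permissions" "reporting_db" {
  database = "reporting"

  privilege_assignments {
    principal  = databricks_group.analysts.display_name
    privileges = ["USAGE"]
  }
}

# Read access and metadata visibility on one table in that database.
resource "databricks_sql_permissions" "events_table" {
  database = "reporting"
  table    = "events"

  privilege_assignments {
    principal  = databricks_group.analysts.display_name
    privileges = ["SELECT", "READ_METADATA"]
  }
}
```

Referencing `databricks_group.analysts.display_name` instead of repeating the string lets Terraform order group creation before the grants.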

Import

The resource can be imported using a synthetic identifier. Examples of valid synthetic identifiers are:

  • table/default.foo - table foo in the default database. The database is always mandatory.
  • view/bar.foo - view foo in the bar database.
  • database/bar - the bar database.
  • catalog/ - entire catalog. / suffix is mandatory.
  • any file/ - direct access to any file. / suffix is mandatory.
  • anonymous function/ - anonymous function. / suffix is mandatory.
$ terraform import databricks_sql_permissions.foo /<object-type>/<object-name>
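
For configurations that prefer declarative imports, the same operation can be sketched with an `import` block, which assumes Terraform 1.5 or later. The identifier uses the `table/default.foo` form from the list of valid synthetic identifiers above; the resource label `foo_table` and the privilege assignment are illustrative and need to match whatever is already granted on the table.

```hcl
# Assumes Terraform >= 1.5, where `import` blocks are available.
import {
  to = databricks_sql_permissions.foo_table
  id = "table/default.foo" # synthetic identifier: <object-type>/<object-name>
}

# The target resource must be declared for the import to attach to it.
resource "databricks_sql_permissions" "foo_table" {
  table = "foo"

  privilege_assignments {
    principal  = "serge@example.com"
    privileges = ["SELECT", "MODIFY"]
  }
}
```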

@nfx nfx requested a review from alexott March 2, 2021 14:44
@nfx nfx self-assigned this Mar 2, 2021

@alexott alexott left a comment

overall looks good, but test is still failing

@nfx

nfx commented Mar 3, 2021

@alexott it's that SP test: it started failing after ADAL was updated to a more recent version. The problem is that it now tries to check whether an MSI endpoint is available and fails otherwise. This does not really affect the normal scenarios with Azure SP, though. Will test with make test-azsp.

@codecov

codecov bot commented Mar 3, 2021

Codecov Report

Merging #545 (b8b65e5) into master (62ca936) will increase coverage by 0.38%.
The diff coverage is 85.08%.

@@            Coverage Diff             @@
##           master     #545      +/-   ##
==========================================
+ Coverage   82.15%   82.53%   +0.38%     
==========================================
  Files          78       79       +1     
  Lines        6786     7027     +241     
==========================================
+ Hits         5575     5800     +225     
- Misses        801      812      +11     
- Partials      410      415       +5     

| Impacted Files | Coverage Δ |
|---|---|
| common/reflect_resource.go | 83.94% <0.00%> (-0.48%) ⬇️ |
| compute/model.go | 76.31% <ø> (ø) |
| qa/testing.go | 70.92% <0.00%> (-0.77%) ⬇️ |
| exporter/util.go | 58.92% <50.00%> (-0.54%) ⬇️ |
| access/resource_sql_permissions.go | 84.57% <84.57%> (ø) |
| common/commands.go | 88.88% <87.17%> (-11.12%) ⬇️ |
| common/azure_auth.go | 77.31% <100.00%> (ø) |
| common/http.go | 85.19% <100.00%> (ø) |
| compute/commands.go | 93.06% <100.00%> (+23.62%) ⬆️ |
| provider/provider.go | 95.59% <100.00%> (+0.01%) ⬆️ |

... and 3 more

@nfx nfx added this to the v0.3.2 milestone Mar 5, 2021
@stikkireddy

hey @nfx, does this use a shared cluster, and what is the throughput of the changes? Also a quick question: does parallelism cause any funky behavior when you are applying grants/denies? Typically in a notebook everything is run sequentially.

@nfx

nfx commented Mar 26, 2021

@stikkireddy it would use the shared cluster (or we can specify one as a parameter). The resource works at the table level, so there should be no conflicts.

@nfx nfx changed the title Table acls [RESOURCE] databricks_table_acl to manage data object access control lists Mar 27, 2021
@pietern pietern left a comment

Did a quick pass. Looks good!

@nfx nfx modified the milestones: v0.3.2, v0.3.3 Apr 1, 2021
@nfx nfx requested review from pietern and alexott April 2, 2021 09:18
@nfx

nfx commented Apr 20, 2021

resource "databricks_sql_permissions" "x" {
    privilege_assignments {
        principal = "serge@example.com"
        privileges = ["SELECT", "READ", "MODIFY"]
    }
}

@nfx nfx changed the title [RESOURCE] databricks_table_acl to manage data object access control lists [RESOURCE] databricks_sql_permissions to manage data object access control lists Apr 21, 2021
@nfx nfx removed the request for review from alexott April 21, 2021 10:09
nfx added 3 commits April 21, 2021 13:10
* Fixed issue with putting back revoked permission
* Added acceptance test
* Enhanced documentation
@adamcain-db
I looked through the changes and no obvious issues jump out. Thanks for this work, @nfx!

@nfx nfx requested a review from alexott April 22, 2021 08:44
@nfx nfx merged commit a652069 into master Apr 22, 2021
@nfx nfx deleted the table-acls branch April 22, 2021 08:59
@alexott alexott left a comment

lgtm ;-)
